In this specific study we would like to understand the effect of EPO on HSC transcriptomic identity. Specifically, CD150+ HSCs were taken from mice and incubated with EPO or No EPO (Ctrl) for 16hrs. Cells from each condition were then sequenced on the 10X Genomics platform, a droplet based approach to isolate single-cells for sequencing.
rm(list=ls())
set.seed(200)
setwd("/Users/jasoncosgrove/Dropbox (Team Perié)/Eisele et al EPO/Rcode by Jason")
#load in helper methods to perform required to perform the analysis
source("EPO_helpermethods.R")
#load in external packages that we need to complete the analysis
usePackage("Seurat")
usePackage("scran")
usePackage("org.Mm.eg.db")
usePackage("clustree")
usePackage("scater")
usePackage("SingleCellExperiment")
usePackage("limma")
usePackage("dplyr")
usePackage("scales")
usePackage("RANN")
In this step we load the datasets for EPO and control into R and convert into a Seurat object. This is an R structure that facilitates data QC and analyses such as dimensionality reduction and clustering.
SlamHSCs <- generateSeuratObject()
To assess the quality of the data we assess the library sizes, numbers of genes expressed and mitochondrial content per cell. It has been poisted that cells which have very high library sizes or relative to other cells in the data may represent doublet cells. However, in our experiments we have very low cell numbers, and thus the probabiilty of finding doublets is not very high in a droplet based sequencing experiment. In addition, we benchmarked our pipeline to find out how many DEGs overlap with known EPO signatures from the wider literature and fine higher consistency when we do not set an upper limit on library sizes.
Cells with very low library sizes are typically because of poor capture quality pontentially due to cell death, premature rupture, or capture of random mRNA escaping from cells, consequently cells with low library sizes are also filtered out from downstream analyses.
Another important QC metric is mitochondrial content. As discussed in AlJanahi et al (2018) “High numbers of mitochondrial transcripts are indicators of cell stress, and therefore cells with elevated mitochondrial gene expression are often not included in the analysis, because most experiments will not benefit from clustering cells based on stress levels.” In our system we know that upon exiting quiescence HSCs increase their mitochondrial mass to cope with increased metabolic demands. Consequently, we expect quite large spread of mitochondrial content in SLAM+ HSCs and only remove very extreme values.
#find the percentage of mitochondrial genes
SlamHSCs<- Seurat::PercentageFeatureSet(object = SlamHSCs, pattern = "^mt-", col.name = "percent.mt")
#plot key QC metrics
VlnPlot(object = SlamHSCs, features = c("nFeature_RNA", "nCount_RNA", "percent.mt"))
SlamHSCs <- subset(SlamHSCs, subset = nFeature_RNA > 1000 & percent.mt < 12)
#lets visualise the data again to see the effects of our filtering
VlnPlot(object = SlamHSCs, features = c("nFeature_RNA", "nCount_RNA", "percent.mt"))
As we sequence sorted SLAM+ HSCs, we expect some heterogeneity with respect to cell-cycle status. To assign a cell cycle phase to each cell, we use the cyclone method (described in Scialdone et al 2015) in the R package scran. In this scheme, a supervised learning approach was used to identify pairs of markers for each cell cycle phase. A G1 marker pair would comprise a gene with high expression in G1 relative to other phases, while the second gene would be lower in G1 relative to all other phases. To classify cell cycle phase on a new dataset, cyclone calculates the proportion of all marker pairs for which the expression of the first gene is higher than the second gene. A high proportion then suggests that the cell belongs to a given cell cycle phase.
mm.pairs <- readRDS(system.file("exdata", "mouse_cycle_markers.rds", package="scran"))
ensembl <- mapIds(org.Mm.eg.db, keys=rownames(SlamHSCs), keytype="SYMBOL", column="ENSEMBL")
assignments <- cyclone(SlamHSCs@assays$RNA@data, mm.pairs, gene.names=ensembl)
SlamHSCs@meta.data$phases <- assignments$phases
SlamHSCs@meta.data$G1_score <- assignments$normalized.scores$G1
SlamHSCs@meta.data$S_score <- assignments$normalized.scores$S
SlamHSCs@meta.data$G2M_score <- assignments$normalized.scores$G2M
table(SlamHSCs@meta.data$phases)
##
## G1 G2M S
## 2938 233 127
#see if there are any differences in cell cycle status between EPO and Control
Idents(SlamHSCs) <- SlamHSCs@meta.data$condition
counts <- table(Idents(SlamHSCs), SlamHSCs@meta.data$phases)
barplot(as.matrix(counts), legend = rownames(counts),col = hue_pal()(2))#gfpneg
prop.df <- data.frame(cbind(data.frame(prop.table(counts[,1])),data.frame(prop.table(counts[,2])),data.frame(prop.table(counts[,3]))))
colnames(prop.df) <- c("G1", "G2M", "S")
barplot(as.matrix(prop.df), legend = rownames(counts),col = hue_pal()(2))#gfpneg
Another important step we perform here is to find variably expressed genes to take forward for further analysis. To do this we use Seurats vst method. Briefly, this approach models the relationship between log mean expression and log variance using local polynomial regression. The features values are then standardized using the observed mean and predicted variance, with the final variance value calculated on the standardized values.
# FindVariableGenes calculates the average expression and dispersion for each gene, places these genes
# into bins, and then calculates a z-score for dispersion within each bin. This helps control for the
# relationship between variability and average expression.
SlamHSCs <- FindVariableFeatures(SlamHSCs, selection.method = "vst", nfeatures = 5000,
verbose = FALSE)
VariableFeaturePlot(SlamHSCs)
When analyzing sequencing data, normalization to eliminate condition effects is crucial if multiple sequencing runs are to be compared with each other. These condition effects can be caused by often unavoidable technical variations such as the duration samples were kept on ice, number of freeze-thaw cycles, method of RNA isolation, sequencing depth, etc.
An additional consideration is that droplet-based sequencing in addition consists of thousands of individual cell experiments, hence cell-specific biases must also be considered when normalizing, in order to be able to compare the expression of one cell to another. A notable cell-specific bias is caused by mRNA capture efficiency, where the mRNA molecules are not captured by the bead at the same proportion in all droplets. As individual cells are not all of the same type a key consideration is how to retain cell to cell variability while eliminating technical noise.
To normalise our data we use the default approach in seurat and also sctransform. In the default approach feature counts for each cell are divided by the total counts for that cell and multiplied by the scale.factor. This is then natural-log transformed using log1p.In the scTransform approach you take the Pearson residuals from ’regularized negative binomial regression’, where cellular sequencing depth is utilized as a covariate in a generalized linear model, to remove the influence of technical characteristics from downstream analyses while preserving biological heterogeneity. In their preprint they show that an unconstrained negative binomial model may overfit scRNA-seq data, and overcome this by pooling information across genes with similar abundances to obtain stable parameter estimates.
In our benchmarking we find higher consistency between DEGs and published EPO signatures when applying the default normalisation approach and so use this normalisation method for downstream analyses. We also run some PCA based plots to visualise the effect that each normalisation procedure is having on the data.
SlamHSCs <- NormalizeData(object = SlamHSCs)
SlamHSCs <- ScaleData(object = SlamHSCs,features = rownames(SlamHSCs))
SlamHSCs <- SCTransform(object = SlamHSCs, verbose = FALSE,variable.features.n = 5000,
conserve.memory = F,return.only.var.genes = F )
#compare these two normalisation methods through PCA based plots
sct.res <- checkNormalisation("SCT", SlamHSCs)
rna.res <- checkNormalisation("RNA", SlamHSCs)
multiplot(rna.res$p1, sct.res$p2, rna.res$p2)
#we see that the amount of variance maintained between the two methods is the same
# in benchmarking studies where we see how well the different normalisation approaches
# can affect the results of our differential expression analysis in terms of known genes
# we find that the default approach is working better
#set the default normalisation as the default for downstream analyses
SlamHSCs@active.assay <- "RNA"
#save the seurat object so we can just load it in later and we dont need to keep recomputing these steps
save(SlamHSCs,file="seurat_object.rda")
We perform dimensionality reduction on variably expressed genes using both principle component analysis, an approach to find the linear combination of genes that are the greatest source of variance in the data, and independent component analysis, a signal processing method designed to separate different signals (in this context a signal is a biological process) that are linearly mixed together. Informally, if you had a smoothie, ICA can tell you what the ingredients are.
We visualize our data using the non-linear dimensionality reduction technique UMAP. This approach is analogous to PCA, but can also identify non-linear patterns in the data. The goal of the algorithm is to learn the underlying manifold of the data in order to place similar cells together in low-dimensional space. UMAP is preferable to t-SNE as it is faster to compute, and uses a graph based approach which permits the organisation clusters into a more biologically accurate representation than t-SNE. Importantly, we use the first 10 independent components as inputs into the UMAP algorithm, this reduces noise and when compared to PCA tended to group cells by established descriptions rather than by cell cycle status, although there was still a dominant cell cycle effect.
SlamHSCs <- FindVariableFeatures(SlamHSCs, selection.method = "vst", nfeatures = 5000,
verbose = FALSE)
SlamHSCs <- RunPCA(object = SlamHSCs, verbose = FALSE,npcs = 50, features = rownames(SlamHSCs))
ElbowPlot(SlamHSCs,ndims = 50)
SlamHSCs <- RunUMAP(object = SlamHSCs, dims = 1:10, verbose = FALSE,assay= "RNA",reduction = "pca",spread = 0.25)
DimPlot(object = SlamHSCs, label = F,reduction = "umap", pt.size = 2,group.by = "condition",
cols = c("grey50", "red"))
In this part of the analysis we perform unsupervised clustering using Seurats graph based approach.
Briefly, this approach involves embedding cells in a graph structure such as a K-nearest neighbour graph, with edges drawn between cells with similar feature expression patterns, and then attempts to partition this graph into a number of highly interconnected subepostatus. As LSK cells do not form discrete clusters, but rather show a smooth continuum of expression, our clustering results were highly sensitive to the resolution parameter of Seurats clustering algorithm. Prior to fixing a value of this parameter we determine how sensitive our clustering is to changes in resolution as this will better enable us to interpret clustering based results.
#perform a cluster robustness analysis changing the resolution parameter of the clustering algo
SlamHSCs <- clusterRobustnessAnalysis(SlamHSCs, min_resolution = 0.1, max_resolution = 1)
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
##
## Number of nodes: 3298
## Number of edges: 109795
##
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.9247
## Number of communities: 3
## Elapsed time: 0 seconds
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
##
## Number of nodes: 3298
## Number of edges: 109795
##
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.8909
## Number of communities: 4
## Elapsed time: 0 seconds
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
##
## Number of nodes: 3298
## Number of edges: 109795
##
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.8658
## Number of communities: 6
## Elapsed time: 0 seconds
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
##
## Number of nodes: 3298
## Number of edges: 109795
##
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.8482
## Number of communities: 6
## Elapsed time: 0 seconds
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
##
## Number of nodes: 3298
## Number of edges: 109795
##
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.8324
## Number of communities: 8
## Elapsed time: 0 seconds
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
##
## Number of nodes: 3298
## Number of edges: 109795
##
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.8190
## Number of communities: 8
## Elapsed time: 0 seconds
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
##
## Number of nodes: 3298
## Number of edges: 109795
##
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.8054
## Number of communities: 8
## Elapsed time: 0 seconds
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
##
## Number of nodes: 3298
## Number of edges: 109795
##
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.7922
## Number of communities: 10
## Elapsed time: 0 seconds
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
##
## Number of nodes: 3298
## Number of edges: 109795
##
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.7821
## Number of communities: 12
## Elapsed time: 0 seconds
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
##
## Number of nodes: 3298
## Number of edges: 109795
##
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.7734
## Number of communities: 12
## Elapsed time: 0 seconds
clustree(SlamHSCs, prefix = "RNA_snn_res.",assay = "RNA")
#from our stability analysis we find 3 core clusters that are stable within the dataset
SlamHSCs <- FindClusters(object = SlamHSCs, resolution = 0.1)
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
##
## Number of nodes: 3298
## Number of edges: 109795
##
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.9247
## Number of communities: 3
## Elapsed time: 0 seconds
DimPlot(SlamHSCs)
To facilitate cluster annotation we provide methods that perform differential expression analysis and also overlay the expression on transcriptomic signatures obtained from the wider literature.
markers <- FindAllMarkers(SlamHSCs,only.pos = TRUE,test.use = "LR",return.thresh = 0.05,logfc.threshold = 0.1)#LR and
x <-markers %>% group_by(cluster) %>% top_n(10, avg_logFC)
print(tbl_df(x), n=40)
## # A tibble: 30 x 7
## p_val avg_logFC pct.1 pct.2 p_val_adj cluster gene
## <dbl> <dbl> <dbl> <dbl> <dbl> <fct> <chr>
## 1 1.27e-123 0.517 0.973 0.872 1.38e-119 0 Aldoa
## 2 6.65e- 84 0.569 0.787 0.55 7.26e- 80 0 Pdzk1ip1
## 3 2.26e- 57 0.487 0.698 0.497 2.46e- 53 0 Txnip
## 4 1.94e- 49 0.476 0.579 0.393 2.11e- 45 0 Sqstm1
## 5 2.33e- 48 0.610 0.688 0.525 2.54e- 44 0 Ly6a
## 6 4.29e- 47 0.444 0.841 0.732 4.68e- 43 0 Ctsd
## 7 6.99e- 47 0.469 0.712 0.544 7.62e- 43 0 Hlf
## 8 1.46e- 41 0.473 0.498 0.316 1.59e- 37 0 Gstm1
## 9 1.98e- 35 0.437 0.497 0.33 2.16e- 31 0 Lpar6
## 10 2.17e- 24 0.519 0.167 0.053 2.36e- 20 0 Meg3
## 11 0. 1.07 1 0.999 0. 1 Actb
## 12 7.72e-124 0.827 0.679 0.356 8.42e-120 1 Slc39a1
## 13 6.26e-115 0.884 0.576 0.281 6.83e-111 1 Abhd2
## 14 3.82e- 92 0.790 0.759 0.649 4.16e- 88 1 Myh9
## 15 4.16e- 92 0.780 0.608 0.347 4.54e- 88 1 Notch2
## 16 3.37e- 79 0.721 0.413 0.185 3.67e- 75 1 Fhl3
## 17 1.82e- 60 0.712 0.613 0.47 1.99e- 56 1 Insig1
## 18 2.14e- 58 0.588 0.697 0.563 2.33e- 54 1 Supt16
## 19 6.84e- 56 0.631 0.529 0.379 7.46e- 52 1 Paip1
## 20 7.69e- 48 0.606 0.553 0.448 8.39e- 44 1 Pan3
## 21 0. 1.24 0.948 0.251 0. 2 Top2a
## 22 0. 1.22 0.999 0.806 0. 2 Hmgb2
## 23 1.55e-322 1.22 0.819 0.112 1.67e-318 2 Birc5
## 24 1.60e-284 0.977 0.844 0.149 1.75e-280 2 Spc24
## 25 3.51e-277 0.972 0.764 0.094 3.83e-273 2 Cdca8
## 26 1.01e-257 1.36 0.846 0.274 1.11e-253 2 Hist1h2ap
## 27 2.01e-253 0.987 0.775 0.107 2.19e-249 2 Mki67
## 28 1.53e-250 1.06 0.919 0.385 1.67e-246 2 H2afx
## 29 5.95e-228 0.953 0.664 0.07 6.49e-224 2 Nusap1
## 30 1.36e-145 1.10 0.477 0.055 1.48e-141 2 Ube2c
DoHeatmap(SlamHSCs, features = x$gene,disp.min = -1.5,disp.max = 1.5)
To put our data in the context of the wider literature we overlay signatures obtained from the wider literature
#you can also look at genesets to help annotate the clusters
fp <-"genesets/published_genesets.csv"
gene.set.names <- c("WilsonMolO","Yang_MPP1","Yang_active_HSCs", "Yang_Quiescent_HSCs","Yang.G2M")
SlamHSCs <- customGeneSetScore(SlamHSCs, gene.set.names, gene.set.fp = fp)
p1 <- VlnPlot(object = SlamHSCs, features = "WilsonMolO1")
p2 <- VlnPlot(object = SlamHSCs, features = "Yang_active_HSCs1")
p3 <- VlnPlot(object = SlamHSCs, features = "Yang_Quiescent_HSCs1")
p4 <- VlnPlot(object = SlamHSCs, features = "Yang.G2M1")
multiplot(p1,p2,p3,p4, cols = 2)
Compare Ctrl vs EPO using DEG approach.
To understand between key differences between our clusters, we perform a differential expression analysis. It is important to note that different approaches for differential expression analysis of single cells rely on different assumptions about that data, and consequently can give very different results. In this analysis we use a logistic regression approach.
In the approach below we employ a logistic regression framework to determine differentially expressed genes. Specifically, we construct a logistic regression model predicting group membership based on each feature individually and compares this to a null model with a likelihood ratio test. This approach is advantageous for the analysis of HSPCs which have an expression profile that is distinct from mature cell subsets. Typically, expression profiles are bimodal, and changes in the magnitude of expression are subtle. In other datasets we have noted that cells have similar expression magnitudes but that different proportions of cells are positive for a given gene, within a given group. Given these unique features of our data, we posit that logistic regression is well suited to performing differential expression analysis of our data.
#set condition as the identity that
Idents(SlamHSCs) <- SlamHSCs@meta.data$condition
#run the DE analysis
markers <- FindMarkers(SlamHSCs,test.use = "LR",logfc.threshold = 0.001, ident.1 = "EPO_treated")
volcanoPlot(markers)
We see a number of differentially expressed genes between the groups, however how many genes would be expect to be different, just due to chance? We can address this question using permutation testing.
#the number of permutations that you would like to perform
permutation.test <- sapply(1:500, performDEA)
#this histogram shows how many DEGs we would expect just due to chance.
hist(permutation.test)
#how many DEGs do we get with the real comparison
Idents(SlamHSCs) <- SlamHSCs@meta.data$condition
markers <- FindAllMarkers(SlamHSCs, verbose = F,
test.use="LR",logfc.threshold = 0.05,
only.pos=TRUE,return.thresh = 0.05)
Based on our differentially expressed genes, are all genes in the EPO group driving these molecular differences or just a subset. To address this question we generate a composite score for all of the genes enriched in the EPO group and plot this composite score at the single cell level
#filter the signature to take only genes that are significantly enriched in the EPO treated group
EPO.sig.filtered <- markers[markers$p_val_adj < 0.05 & markers$avg_logFC > 0,]
#create a composite score for all of these genes and overlay expression onto our UMAP visualisation
SlamHSCs <- AddModuleScore(SlamHSCs, features = list(rownames(EPO.sig.filtered)),name = "EPO")
FeaturePlot(SlamHSCs, features = "EPO1", min.cutoff = "q3", max.cutoff = "q97", pt.size = 2, cols = c("lightgrey","red"))
#cassign a threhsold for the expressin of our composite score and call all cells above this threshold EPOresponsive
SlamHSCs@meta.data$eporesponder <- SlamHSCs@meta.data$EPO1 > quantile(SlamHSCs@meta.data$EPO1 ,c(0.9))
SlamHSCs@meta.data$epostatus <- SlamHSCs@meta.data$condition
levels(SlamHSCs@meta.data$epostatus) <- c(levels(SlamHSCs@meta.data$condition), "EPO_responsive", "EPO_nonresponsive")
SlamHSCs@meta.data$epostatus[SlamHSCs@meta.data$eporesponder == TRUE] <- "EPO_responsive"
SlamHSCs@meta.data$epostatus[SlamHSCs@meta.data$epostatus == "EPO_treated"] <- "EPO_nonresponsive"
#overlay this classification onto our UMAP
DimPlot(SlamHSCs, group.by = "epostatus")
#Find markers for our 3 groups of interest
Idents(SlamHSCs) <- SlamHSCs@meta.data$epostatus
markers <- FindAllMarkers(SlamHSCs, verbose = F,
test.use="LR",logfc.threshold = 0.05,
only.pos=TRUE,return.thresh = 0.05)
#see whether epo status is related to cell cycle.
counts <- table(SlamHSCs@meta.data$phases, SlamHSCs@meta.data$epostatus)
prop.df <- data.frame(cbind(data.frame(prop.table(counts[,1])),data.frame(prop.table(counts[,3])) , data.frame(prop.table(counts[,4]))))
colnames(prop.df) <- c("Control","EPO_responder", "EPO_nonresponder")
barplot(as.matrix(prop.df), beside = TRUE, legend = rownames(counts),col = hue_pal()(length(unique(SlamHSCs@meta.data$phases))))#gfpneg
To annotate our cells in a supervised way, we take a published reference dataset of 44,802 cKit cells generated by 10X (Dahlin et al. 2018) and map our cells onto this reference map. For each cell of the query dataset we determining the k-nearest neighbours in the reference dataset in PCA space. The mean umap coordinates of the 10 closest neighbours in the reference data are then taken as the new UMAP coordinates for the query cells.
## integer(0)
## integer(0)
## integer(0)
## integer(0)
## integer(0)
## integer(0)
## integer(0)
Butler, Andrew et al. Integrating Single-Cell Transcriptomic Data across Different Conditions, Technologies, and Species. Nature Biotechnology 36, no. 5 (May 2018): 411???20. https://doi.org/10.1038/nbt.4096.
Fan, Jean et al. Characterizing Transcriptional Heterogeneity through Pathway and Gene Set Overdispersion Analysis. Nature Methods 13, no. 3 (March 2016): 241???44. https://doi.org/10.1038/nmeth.3734.
Kharchenko, Peter V., Lev Silberstein, and David T. Scadden. Bayesian Approach to Single-Cell Differential Expression Analysis. Nature Methods 11, no. 7 (July 2014): 740???42. https://doi.org/10.1038/nmeth.2967.
Scialdone, Antonio et al. Computational Assignment of Cell-Cycle Stage from Single-Cell Transcriptome Data. Methods (San Diego, Calif.) 85 (September 1, 2015): 54???61. https://doi.org/10.1016/j.ymeth.2015.06.021.
Lun A, Risso D (2018). SingleCellExperiment: S4 Classes for Single Cell Data. R package version 1.4.0.
Yang, Jennifer et al. Single Cell Transcriptomics Reveals Unanticipated Features of Early Hematopoietic Precursors. Nucleic Acids Research 45, no. 3 (17 2017): 1281???96. https://doi.org/10.1093/nar/gkw1214.
Giladi, Amir et al. Single-Cell Characterization of Haematopoietic Progenitors and Their Trajectories in Homeostasis and Perturbed Haematopoiesis. Nature Cell Biology 20, no. 7 (July 2018): 836???46. https://doi.org/10.1038/s41556-018-0121-4.
Grover, Amit et al. Erythropoietin Guides Multipotent Hematopoietic Progenitor Cells toward an Erythroid Fate. Journal of Experimental Medicine 211, no. 2 (February 10, 2014): 181???88. https://doi.org/10.1084/jem.20131189.
Tusi, Betsabeh Khoramian et al. Population Snapshots Predict Early Hematopoietic and Erythroid Hierarchies. Nature 555, no. 7694 (March 1, 2018): 54???60. https://doi.org/10.1038/nature25741.
Singh, Rashim Pal et al. Hematopoietic Stem Cells but Not Multipotent Progenitors Drive Erythropoiesis during Chronic Erythroid Stress in EPO Transgenic Mice. Stem Cell Reports 10, no. 6 (June 5, 2018): 1908???19. https://doi.org/10.1016/j.stemcr.2018.04.012.
AlJanahi, Aisha A., Mark Danielsen, and Cynthia E. Dunbar. ‘An Introduction to the Analysis of Single-Cell RNA-Sequencing Data’. Molecular Therapy - Methods & Clinical Development 10 (21 September 2018): 189–96. https://doi.org/10.1016/j.omtm.2018.07.003.
Dahlin, Joakim S., Fiona K. Hamey, Blanca Pijuan-Sala, Mairi Shepherd, Winnie W. Y. Lau, Sonia Nestorowa, Caleb Weinreb, et al. “A Single-Cell Hematopoietic Landscape Resolves 8 Lineage Trajectories and Defects in Kit Mutant Mice.” Blood 131, no. 21 (May 24, 2018): e1–11. https://doi.org/10.1182/blood-2017-12-821413.